Gene Onoltogy Analysis on day25 Heterozygous

1. Environment Set Up

for (i in 1:length(params))
  print(paste('Parameter:', names(params)[i], ' - Value:', params[[i]], '- Class:', class(params[[i]])))
## [1] "Parameter: Dataset  - Value: CHD2_iPSCs_and_organoids_PublicRepo - Class: character"
## [1] "Parameter: SEFile  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/day25/Output/Savings/day25CbO.SE_deseq2_HT.rds - Class: character"
## [1] "Parameter: DEAList  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/day25/Output/Savings/day25CbO.DEAList_HT.rds - Class: character"
## [1] "Parameter: HT  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/day25/Output/Savings/day25CbO.deseqHTvsWT.rds - Class: character"
## [1] "Parameter: SavingFolder  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/6.Enrichments/day25/Output/Savings/ - Class: character"
## [1] "Parameter: FiguresFolder  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/6.Enrichments/day25/Output/Figures/ - Class: character"
## [1] "Parameter: FDRthr  - Value: 0.05 - Class: numeric"
## [1] "Parameter: logFCthr  - Value: 0.55 - Class: numeric"
## [1] "Parameter: TopGO  - Value: BP_MF_CC - Class: character"
## [1] "Parameter: GoEnTh  - Value: 1 - Class: numeric"
## [1] "Parameter: GoPvalTh  - Value: 0.05 - Class: numeric"
## [1] "Parameter: NbName  - Value: TopGO_day25_HT - Class: character"
## [1] "Parameter: SaveImages  - Value: FALSE - Class: logical"
  • Dataset: name of the dataset that is processed.
  • SE: input file containing differential expression results from DESeq2 (absolute path).
  • FDRthr: threshold on False Discovery Rate. Default 0.05.
  • logFCthr: threshold on False Discovery Rate. Default 1.5.
  • SavingFolder: directory where produced files will be written (absolute path). Default is getwd().
  • TopGO: string that specify the ontology domains to be analysed. Default BP and MF; also CC can be added.
library(RNASeqBulkExploratory)
library(SummarizedExperiment)
library(tidyr)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(topGO)
library(sechm)
library(ggplot2)
library(grid)
library(gridExtra)
library(RColorBrewer)
library(cowplot)
library(data.table)
source('/group/testa/Users/oliviero.leonardi/myProjects/CHD2/BulkRNAseq/ContainerHome/CHD2_organoids/NoGradientBarplots.R')
Dataset <- params$Dataset
logFCthr <- params$logFCthr
FDRthr <- params$FDRthr

FdrTh <- FDRthr
logFcTh <- logFCthr

SavingFolder <- ifelse(is.null(params$SavingFolder), getwd(), params$SavingFolder)
FiguresFolder <- ifelse(is.null(params$FiguresFolder), getwd(), params$FiguresFolder)


if (dir.exists(SavingFolder) == FALSE) {
  dir.create(SavingFolder, recursive=TRUE)
}

2. Data Upload

  • Summarized Experiment object containing expression data used for DEA and gene and sample metadata
  • DEA object, containing results of the differential expression

2.1 Load Data from DEA

#SE object coming from DEA, but not containing specific contrast results
SE_DEA <- readRDS(params$SEFile)

SE_DEA <- SE_DEA[rowData(SE_DEA)$GeneName != '', ]
rownames(SE_DEA) <- rowData(SE_DEA)$GeneName

# List with differential expression results (all time-points)
DEA <- readRDS(params$DEAList)
colvector <- c("#5ec962", "#e95462", "#2c728e")
names(colvector) <- c('All', 'Up',  'Down')

2.2 Add DEA results to SE

if(! identical(rownames(SE_DEA), row.names(DEA$HT$res))){
  stop('Expression data in SE and results from differential espression analysis are inconsistent.')
}

SE_DEA <- mergeDeaSE(SE_DEA, DEA$HT$res, subsetRowDataCols=NULL,
                     logFcCol='log2FoldChange', FdrCol='padj') #specify
## Renaming " log2FoldChange " to "logFC"
## Renaming "  padj  " to "FDR"

17781 genes in 13 samples have been testes for differential expression.

The following number of genes are identified as differentially expressed:

  • FDR < 0.1: 3346 differentially expressed genes
  • FDR < 0.05: 2653 differentially expressed genes
  • FDR < 0.05 and FC > 1.5: 1511 differentially expressed genes
  • FDR < 0.05 and FC > 2: 916 differentially expressed genes

Imposing a threshold of 0.55 on the Log2FC and 0.05 on the FDR (as specified in parameters), 2653 genes are selected: 2300 up-regulated genes and 1931 down-regulated genes.


3. RESULTS NAVIGATION: Interactive Table

An interactive table show the results for all DEGs (ranked according to FDR). A table of all DEGs can be downloaded from here.

From topTag I generate an interactive table for result interrogation with link to the Gene Cards. The table reports all the genes having a FDR < 0.05 and a Log2FC > 0.55 as absolute value, according to the threshold settings.

DEGsTable(SE_DEA, FdrTh=FdrTh, logFcTh=logFCthr, maxGenes=Inf, saveDEGs=FALSE)

4. RESULTS VISUALIZATION

4.1 Volcano plot

The results of the differential expression analysis are visualized by Volcano plot. An interactive version is included in the html (only genes with FDR < threshold), while a static version is saved.

plotVolcanoSE(SE=SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr, 
              FdrCeil=1e-10, logFcCeil=4, Interactive = FALSE)
## Warning: Removed 1035 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 17584 rows containing missing values or values outside the scale range
## (`geom_text_repel()`).
## Warning: ggrepel: 128 unlabeled data points (too many overlaps). Consider increasing max.overlaps


plotVolcanoSE(SE=SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr, 
              FdrCeil=1e-10, logFcCeil=4, Interactive = TRUE)

4.2 Heatmap for significant genes

Heatmaps for DEGs, showing scaled vst values.

DEGs <- dplyr::filter(data.frame(rowData(SE_DEA)), FDR < FDRthr & abs(logFC) > log2(logFCthr))
ScaledCols <- c('darkblue', "purple","white","lightgoldenrod1", 'goldenrod1')
# sechm::sechm(SE_DEA, genes=DEGs$GeneName, assayName="vst", gaps_at="Genotype", show_rownames=FALSE,
#       top_annotation=c('Genotype'), hmcols=ScaledCols, show_colnames=TRUE,
#       do.scale=TRUE, breaks=0.85, column_title = "Scaled Vst Values")

5. TOPGO for Gene Ontology Enrichment analysis

Gene ontology enrichment analysis is performed on the set of 2653 genes using TopGO with Fisher statistics and weight01 algorithm.

For each specified domain of the ontology:

  • Enrichment analysis on all DEGs or splitted in down- and up-regulated

5.1 Selection of modulated genes and generation of gene vectors

I generate vectors for the gene universe, all modulated genes, up-regulated genes and down-regulated genes in the format required by TopGo.

GeneVectors <- topGOGeneVectors(SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr)
## Gene vector contains levels: 0,1
## Gene vector contains levels: 0,1
## Gene vector contains levels: 0,1

Therefore:

  • universe genes: 17781 genes
  • modulated genes: 1583 genes
  • down-regulated genes: 592 genes of interest
  • up-regulated genes: 991 genes of interest

Then I set parameters according to the gene ontology domains to be evaluated. By default, Biological Process and Molecular Function domains are interrogated.

BpEval <- ifelse(length(grep('BP', params$TopGO))!=0, TRUE, FALSE)
MfEval <- ifelse(length(grep('MF', params$TopGO))!=0, TRUE, FALSE)
CcEval <- ifelse(length(grep('CC', params$TopGO))!=0, TRUE, FALSE)

5.2 TopGO analysis: Biological Process

On the basis of the analysis settings, the enrichment for Biological Process IS performed.

Biological Process Analysis for ALL modulated genes: 1583 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
BPannHT <- topGO::annFUN.org(whichOnto="BP", feasibleGenes=names(GeneVectors$DEGenes), 
                    mapping="org.Hs.eg.db", ID="symbol") %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResBPAllHT <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=BPannHT, ontology='BP', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='BPAllHT', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11815 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15469 GO terms and 35188 relations. )
## 
## Annotating nodes ...............
##  ( 14087 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 6783 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 19:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 18:  6 nodes to be scored    (0 eliminated genes)
## 
##   Level 17:  12 nodes to be scored   (31 eliminated genes)
## 
##   Level 16:  20 nodes to be scored   (49 eliminated genes)
## 
##   Level 15:  58 nodes to be scored   (139 eliminated genes)
## 
##   Level 14:  108 nodes to be scored  (275 eliminated genes)
## 
##   Level 13:  186 nodes to be scored  (671 eliminated genes)
## 
##   Level 12:  316 nodes to be scored  (1547 eliminated genes)
## 
##   Level 11:  571 nodes to be scored  (3743 eliminated genes)
## 
##   Level 10:  855 nodes to be scored  (5869 eliminated genes)
## 
##   Level 9:   1025 nodes to be scored (7579 eliminated genes)
## 
##   Level 8:   1013 nodes to be scored (9576 eliminated genes)
## 
##   Level 7:   967 nodes to be scored  (11149 eliminated genes)
## 
##   Level 6:   781 nodes to be scored  (12294 eliminated genes)
## 
##   Level 5:   478 nodes to be scored  (13077 eliminated genes)
## 
##   Level 4:   252 nodes to be scored  (13602 eliminated genes)
## 
##   Level 3:   109 nodes to be scored  (13781 eliminated genes)
## 
##   Level 2:   22 nodes to be scored   (13875 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13924 eliminated genes)

Biological Process Analysis for DOWN-REGULATED genes: 592 genes

# Wrapper function for topGO analysis (see helper file)
ResBPDownHT <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=BPannHT, ontology='BP', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='BPDownHT', outDir=paste0(SavingFolder)) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11815 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15469 GO terms and 35188 relations. )
## 
## Annotating nodes ...............
##  ( 14087 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 4939 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 19:  1 nodes to be scored    (0 eliminated genes)
## 
##   Level 18:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 17:  6 nodes to be scored    (18 eliminated genes)
## 
##   Level 16:  11 nodes to be scored   (25 eliminated genes)
## 
##   Level 15:  31 nodes to be scored   (83 eliminated genes)
## 
##   Level 14:  55 nodes to be scored   (184 eliminated genes)
## 
##   Level 13:  106 nodes to be scored  (506 eliminated genes)
## 
##   Level 12:  194 nodes to be scored  (1270 eliminated genes)
## 
##   Level 11:  321 nodes to be scored  (3430 eliminated genes)
## 
##   Level 10:  559 nodes to be scored  (5529 eliminated genes)
## 
##   Level 9:   748 nodes to be scored  (7046 eliminated genes)
## 
##   Level 8:   757 nodes to be scored  (9231 eliminated genes)
## 
##   Level 7:   751 nodes to be scored  (10939 eliminated genes)
## 
##   Level 6:   634 nodes to be scored  (12164 eliminated genes)
## 
##   Level 5:   423 nodes to be scored  (12973 eliminated genes)
## 
##   Level 4:   216 nodes to be scored  (13583 eliminated genes)
## 
##   Level 3:   102 nodes to be scored  (13777 eliminated genes)
## 
##   Level 2:   21 nodes to be scored   (13871 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13923 eliminated genes)

# Selection on enrichment of at least 2 is implemented (also to avoid depleted categories). Then categories are ranked by PVal and all the ones with Pval < th are selected. If the number is < minTerms, othter terms are included to reach the minimum number. 
GOTable(ResBPDownHT$ResSel, maxGO=20)

Biological Process Analysis for UP-REGULATED genes: 991 genes

ResBPUpHT <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=BPannHT, ontology='BP', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='BPUpHT', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11815 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15469 GO terms and 35188 relations. )
## 
## Annotating nodes ...............
##  ( 14087 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 5932 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 19:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 18:  5 nodes to be scored    (0 eliminated genes)
## 
##   Level 17:  10 nodes to be scored   (31 eliminated genes)
## 
##   Level 16:  19 nodes to be scored   (48 eliminated genes)
## 
##   Level 15:  53 nodes to be scored   (132 eliminated genes)
## 
##   Level 14:  99 nodes to be scored   (255 eliminated genes)
## 
##   Level 13:  161 nodes to be scored  (642 eliminated genes)
## 
##   Level 12:  265 nodes to be scored  (1497 eliminated genes)
## 
##   Level 11:  490 nodes to be scored  (3575 eliminated genes)
## 
##   Level 10:  719 nodes to be scored  (5536 eliminated genes)
## 
##   Level 9:   867 nodes to be scored  (7264 eliminated genes)
## 
##   Level 8:   881 nodes to be scored  (9243 eliminated genes)
## 
##   Level 7:   857 nodes to be scored  (10970 eliminated genes)
## 
##   Level 6:   702 nodes to be scored  (12211 eliminated genes)
## 
##   Level 5:   435 nodes to be scored  (13009 eliminated genes)
## 
##   Level 4:   240 nodes to be scored  (13593 eliminated genes)
## 
##   Level 3:   103 nodes to be scored  (13779 eliminated genes)
## 
##   Level 2:   22 nodes to be scored   (13874 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13923 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/BPUp'), recursive=TRUE)
#GOAnnotation(ResBPUp$ResSel, GOdata=ResBPUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/BPUp'), keytype='SYMBOL')
GOTable(ResBPUpHT$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResBPAllHT$ResSel, TopGOResDown=ResBPDownHT$ResSel, TopGOResUp = ResBPUpHT$ResSel,
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResBPAllHT$ResSel, TopGOResDown = ResBPDownHT$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`

5.3 TopGO analysis: Molecular Function

On the basis of the analysis settings, the enrichment for Molecular Function IS performed.

Molecular Function Enrichment for ALL modulated genes: 1583 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
MFannHT <- topGO::annFUN.org(whichOnto='MF', feasibleGenes=names(GeneVectors$DEGenes), 
                           mapping='org.Hs.eg.db', ID='symbol') %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResMFAllHT <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=MFannHT, ontology='MF', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='MFAllHT', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4195 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4656 GO terms and 6011 relations. )
## 
## Annotating nodes ...............
##  ( 14512 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 1226 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  5 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  18 nodes to be scored   (0 eliminated genes)
## 
##   Level 10:  25 nodes to be scored   (34 eliminated genes)
## 
##   Level 9:   61 nodes to be scored   (230 eliminated genes)
## 
##   Level 8:   111 nodes to be scored  (1394 eliminated genes)
## 
##   Level 7:   194 nodes to be scored  (3543 eliminated genes)
## 
##   Level 6:   245 nodes to be scored  (4377 eliminated genes)
## 
##   Level 5:   279 nodes to be scored  (6255 eliminated genes)
## 
##   Level 4:   216 nodes to be scored  (9285 eliminated genes)
## 
##   Level 3:   57 nodes to be scored   (11722 eliminated genes)
## 
##   Level 2:   14 nodes to be scored   (12486 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14371 eliminated genes)

Molecular Function Enrichment for DOWN-REGULATED genes: 592 genes

ResMFDownHT <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=MFannHT, ontology='MF', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='MFDownHT', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4195 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4656 GO terms and 6011 relations. )
## 
## Annotating nodes ...............
##  ( 14512 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 876 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  10 nodes to be scored   (0 eliminated genes)
## 
##   Level 10:  20 nodes to be scored   (21 eliminated genes)
## 
##   Level 9:   41 nodes to be scored   (165 eliminated genes)
## 
##   Level 8:   71 nodes to be scored   (1364 eliminated genes)
## 
##   Level 7:   132 nodes to be scored  (3444 eliminated genes)
## 
##   Level 6:   169 nodes to be scored  (4159 eliminated genes)
## 
##   Level 5:   203 nodes to be scored  (5847 eliminated genes)
## 
##   Level 4:   164 nodes to be scored  (8984 eliminated genes)
## 
##   Level 3:   50 nodes to be scored   (11545 eliminated genes)
## 
##   Level 2:   13 nodes to be scored   (12391 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14369 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/MFDown'), recursive=TRUE)
#GOAnnotation(ResMFDown$ResSel, GOdata=ResMFDown$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/MFDown'), keytype='SYMBOL')
GOTable(ResMFDownHT$ResSel, maxGO=20)

Molecular Function Analysis for UP-REGULATED genes: 991 genes

ResMFUpHT <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=MFannHT, ontology='MF', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='MFUpHT', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4195 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4656 GO terms and 6011 relations. )
## 
## Annotating nodes ...............
##  ( 14512 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 993 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  4 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  12 nodes to be scored   (0 eliminated genes)
## 
##   Level 10:  18 nodes to be scored   (28 eliminated genes)
## 
##   Level 9:   42 nodes to be scored   (179 eliminated genes)
## 
##   Level 8:   85 nodes to be scored   (1356 eliminated genes)
## 
##   Level 7:   144 nodes to be scored  (3452 eliminated genes)
## 
##   Level 6:   192 nodes to be scored  (4209 eliminated genes)
## 
##   Level 5:   235 nodes to be scored  (5987 eliminated genes)
## 
##   Level 4:   190 nodes to be scored  (8956 eliminated genes)
## 
##   Level 3:   56 nodes to be scored   (11582 eliminated genes)
## 
##   Level 2:   14 nodes to be scored   (12415 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14356 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/MFUp'), recursive=TRUE)
#GOAnnotation(ResMFUp$ResSel, GOdata=ResMFUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/MFUp'), keytype='SYMBOL')
GOTable(ResMFUpHT$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResMFAllHT$ResSel, TopGOResDown=ResMFDownHT$ResSel, TopGOResUp=ResMFUpHT$ResSel, 
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResMFAllHT$ResSel, TopGOResDown = ResMFDownHT$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`

5.4 TopGO analysis: Cellular Component

On the basis of the analysis settings, the enrichment for Cellular Component IS performed.

Cellular Component Enrichment for ALL modulated genes: 1583 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
CCannHT <- topGO::annFUN.org(whichOnto='CC', feasibleGenes=names(GeneVectors$DEGenes), 
                           mapping='org.Hs.eg.db', ID='symbol') %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResCCAllHT <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=CCannHT, ontology='CC', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='CCAllHT', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1707 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1926 GO terms and 3236 relations. )
## 
## Annotating nodes ...............
##  ( 14776 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 695 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 13:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 12:  4 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  31 nodes to be scored   (47 eliminated genes)
## 
##   Level 10:  70 nodes to be scored   (63 eliminated genes)
## 
##   Level 9:   100 nodes to be scored  (817 eliminated genes)
## 
##   Level 8:   108 nodes to be scored  (2672 eliminated genes)
## 
##   Level 7:   115 nodes to be scored  (5077 eliminated genes)
## 
##   Level 6:   100 nodes to be scored  (8882 eliminated genes)
## 
##   Level 5:   72 nodes to be scored   (10679 eliminated genes)
## 
##   Level 4:   48 nodes to be scored   (12804 eliminated genes)
## 
##   Level 3:   42 nodes to be scored   (14127 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14555 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14702 eliminated genes)

#write.table(ResCCAll$ResAll, file=paste0(SavingFolder, 'TopGO/CCAllResults.txt'), sep='\t', row.names=FALSE)

Cellular Component Enrichment for DOWN-REGULATED genes: 592 genes

# Wrapper function for topGO analysis (see helper file)
ResCCDownHT <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=CCannHT, ontology='CC', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='CCDownHT', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1707 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1926 GO terms and 3236 relations. )
## 
## Annotating nodes ...............
##  ( 14776 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 560 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 13:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 12:  2 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  23 nodes to be scored   (47 eliminated genes)
## 
##   Level 10:  57 nodes to be scored   (55 eliminated genes)
## 
##   Level 9:   75 nodes to be scored   (730 eliminated genes)
## 
##   Level 8:   82 nodes to be scored   (2560 eliminated genes)
## 
##   Level 7:   92 nodes to be scored   (4792 eliminated genes)
## 
##   Level 6:   82 nodes to be scored   (8771 eliminated genes)
## 
##   Level 5:   64 nodes to be scored   (10641 eliminated genes)
## 
##   Level 4:   40 nodes to be scored   (12777 eliminated genes)
## 
##   Level 3:   38 nodes to be scored   (14124 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14553 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14702 eliminated genes)


#dir.create(paste0(SavingFolder, 'TopGO/CCDown'), recursive=TRUE)
#GOAnnotation(ResCCDown$ResSel, GOdata=ResCCDown$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/CCDown'), keytype='SYMBOL')
GOTable(ResCCDownHT$ResSel, maxGO=20)

Cellular Component Analysis for UP-REGULATED genes: 991 genes

# Wrapper function for topGO analysis (see helper file)
ResCCUpHT <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=CCannHT, ontology='CC', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='CCUpHT', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1707 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1926 GO terms and 3236 relations. )
## 
## Annotating nodes ...............
##  ( 14776 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 591 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 13:  1 nodes to be scored    (0 eliminated genes)
## 
##   Level 12:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  26 nodes to be scored   (35 eliminated genes)
## 
##   Level 10:  53 nodes to be scored   (55 eliminated genes)
## 
##   Level 9:   84 nodes to be scored   (774 eliminated genes)
## 
##   Level 8:   87 nodes to be scored   (2513 eliminated genes)
## 
##   Level 7:   100 nodes to be scored  (4980 eliminated genes)
## 
##   Level 6:   90 nodes to be scored   (8744 eliminated genes)
## 
##   Level 5:   64 nodes to be scored   (10657 eliminated genes)
## 
##   Level 4:   39 nodes to be scored   (12791 eliminated genes)
## 
##   Level 3:   41 nodes to be scored   (14126 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14552 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14702 eliminated genes)


#dir.create(paste0(SavingFolder, 'TopGO/CCUp'), recursive=TRUE)
#GOAnnotation(ResCCUp$ResSel, GOdata=ResCCUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/CCUp'), keytype='SYMBOL')
GOTable(ResCCUpHT$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResCCAllHT$ResSel, TopGOResDown=ResCCDownHT$ResSel, TopGOResUp=ResCCUpHT$ResSel,
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResCCAllHT$ResSel, TopGOResDown = ResCCDownHT$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`


7. Savings

Most of the useful information has been saved during the analysis. Here I save figures, workspace and information about the session.

if (params$SaveImages == TRUE){   #Just in case since eval only works when knitting
  #Set the folder paths
  from <- paste(getwd(), paste(params$NbName, 'files/figure-html', sep='_'), sep='/')
  to <- params$FiguresFolder

  #Copy to output directory
  file.copy(from, to, recursive = TRUE, copy.mode = TRUE)
}
ResSelBP_HT <- list(ResSelAll = ResBPAllHT$ResSel,
                    ResSelUp = ResBPUpHT$ResSel,
                    ResSelDown = ResBPDownHT$ResSel)

ResSelMF_HT <- list(ResSelAll = ResMFAllHT$ResSel,
                    ResSelUp = ResMFUpHT$ResSel,
                    ResSelDown = ResMFDownHT$ResSel)

ResSelCC_HT <- list(ResSelAll = ResCCAllHT$ResSel,
                    ResSelUp = ResCCUpHT$ResSel,
                    ResSelDown = ResCCDownHT$ResSel)

ResSel_HT <- list(BP = ResSelBP_HT,
                  MF = ResSelMF_HT,
                  CC = ResSelCC_HT)

saveRDS(ResSel_HT, paste0(SavingFolder, '/day25CbO.', 'ResSel_HT.rds'))
SessionInfo <- sessionInfo()
Date <- date()

save.image(paste0(SavingFolder, '/day25CbO.', 'FunctionalAnalysisWorkspace_HT.RData'))

last update on: Fri Jan 10 20:49:01 2025